Selectivity Estimation Without the Attribute Value Independence Assumption
نویسندگان
چکیده
The result size of a query that involves multiple attributes from the same relation depends on these attributes’joinr data distribution, i.e., the frequencies of all combinations of attribute values. To simplify the estimation of that size, most commercial systems make the artribute value independenceassumption and maintain statistics (typically histograms) on individual attributes only. In reality, this assumption is almost always wrong and the resulting estimations tend to be highly inaccurate. In this paper, we propose two main alternatives to effectively approximate (multi-dimensional) joint data distributions. (a) Using a multi-dimensional histogram, (b) Using the Singular Value Decomposition (SVD) technique from linear algebra. An extensive set of experiments demonstrates the advantages and disadvantages of the two approaches and the benefits of both compared to the independence assumption.
منابع مشابه
Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions
As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintai...
متن کاملThe effects of the violation of local independence assumption on the person measures under the Rasch model
Local independence of test items is an assumption in all Item Response Theory (IRT) models. That is, the items in a test should not be related to each other. Sharing a common passage, which is prevalent in reading comprehension tests, cloze tests and C-Tests, can be a potential source of local item dependence (LID). It is argued in the literature that LID results in biased parameter estimation ...
متن کاملAttribute Value Weighted Average of One-Dependence Estimators
Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, semi-naive Bayesian classifiers which utilize one-dependence estimators (ODEs) have been shown to be able to approximate the ground-truth attribute dependencies; meanwhile, the probability estimation in ODEs is effective, thus leading to excellent performance. In previous studies, OD...
متن کاملSelectivity Estimation for Exclusive Query Translation in Deep Web Data Integration
In Deep Web data integration, some Web database interfaces express exclusive predicates of the form Qe = Pi(Pi ∈ P1, P2, . . . , Pm), which permits only one predicate to be selected at a time. Accurately and efficiently estimating the selectivity of each Qe is of critical importance to optimal query translation. In this paper, we mainly focus on the selectivity estimation on infinite-value attr...
متن کاملBeyond Independence: Conditions for the Optimality of the Simple Bayesian Classi er
The simple Bayesian classi er (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed so far. In this paper we show that the SBC does not in fact assume attribute independence, and ca...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1997